feat: Add Spark concat_ws function #8854
Found that Spark supports mixing string and array arguments, so this patch needs to be enhanced. For example, SELECT concat_ws(',', 'a', 'b', array('c', 'd'), 'e');
returns "a,b,c,d,e".
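The mixed-argument behavior described in this comment can be modeled with a small standalone sketch. This is not the Velox implementation from this PR: the `Arg` alias and the `concatWs` helper are hypothetical names introduced only for illustration, and null handling is left out.

```cpp
#include <string>
#include <variant>
#include <vector>

// Hypothetical argument type: either a single string or an array of strings.
using Arg = std::variant<std::string, std::vector<std::string>>;

// Flattens the arguments in order (array elements are expanded in place)
// and joins the resulting strings with the separator.
std::string concatWs(const std::string& separator, const std::vector<Arg>& args) {
  std::string result;
  bool first = true;
  auto append = [&](const std::string& value) {
    if (!first) {
      result += separator;
    }
    result += value;
    first = false;
  };
  for (const auto& arg : args) {
    if (const auto* str = std::get_if<std::string>(&arg)) {
      append(*str);
    } else {
      for (const auto& element : std::get<std::vector<std::string>>(arg)) {
        append(element);
      }
    }
  }
  return result;
}
```

Calling `concatWs(",", ...)` with the arguments `a`, `b`, the array `[c, d]`, and `e` yields `a,b,c,d,e`, matching the Spark result quoted above.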
The latest code supports this case.
@rui-mo, could you take a look?
@rui-mo, could you review again? Thanks!
Thanks.
Thanks.
Thanks. Looks good to me overall, modulo some minor comments.
"concat_ws",
std::vector<facebook::velox::exec::FunctionSignaturePtr>{
    // Signature: concat_ws (separator, input,...) -> output:
    // varchar, varchar, varchar,.. -> varchar
Do we need to cover the case of mixed string and array?
With arrays enabled, the fuzzer test framework reports an issue when max_level_of_nesting=10. It does not seem to be related to our implementation logic.
Can you show the exception? Maybe we should fix it first.
See the error message below. I ran the test with --velox_fuzzer_enable_complex_types. It seems this option has not been enabled for the Spark fuzzer test? cc @rui-mo
@ 0000000003cbcd71 folly::symbolizer::(anonymous namespace)::signalHandler(int, siginfo_t*, void*)
/root/PHILO/workspace/velox/deps-download/folly/folly/debugging/symbolizer/SignalHandler.cpp:453
@ 000000000001441f (unknown)
@ 0000000001f13a7a facebook::velox::exec::LocalSelectivityVector::~LocalSelectivityVector()
@ 000000000129f0f6 facebook::velox::exec::Expr::evalSimplified(facebook::velox::SelectivityVector const&, facebook::velox::exec::EvalCtx&, std::shared_ptr<facebook::velox::BaseVector>&) [clone .cold]
@ 00000000038ca794 facebook::velox::exec::Expr::evalSimplifiedImpl(facebook::velox::SelectivityVector const&, facebook::velox::exec::EvalCtx&, std::shared_ptr<facebook::velox::BaseVector>&)
@ 00000000038cb077 facebook::velox::exec::Expr::evalSimplified(facebook::velox::SelectivityVector const&, facebook::velox::exec::EvalCtx&, std::shared_ptr<facebook::velox::BaseVector>&)
@ 00000000038ca794 facebook::velox::exec::Expr::evalSimplifiedImpl(facebook::velox::SelectivityVector const&, facebook::velox::exec::EvalCtx&, std::shared_ptr<facebook::velox::BaseVector>&)
@ 00000000038cb077 facebook::velox::exec::Expr::evalSimplified(facebook::velox::SelectivityVector const&, facebook::velox::exec::EvalCtx&, std::shared_ptr<facebook::velox::BaseVector>&)
@ 00000000038cb1bd facebook::velox::exec::ExprSetSimplified::eval(int, int, bool, facebook::velox::SelectivityVector const&, facebook::velox::exec::EvalCtx&, std::vector<std::shared_ptr<facebook::velox::BaseVector>, std::allocator<std::shared_ptr<facebook::velox::BaseVector> > >&)
@ 0000000001fd4743 facebook::velox::test::ExpressionVerifier::verify(std::vector<std::shared_ptr<facebook::velox::core::ITypedExpr const>, std::allocator<std::shared_ptr<facebook::velox::core::ITypedExpr const> > > const&, std::shared_ptr<facebook::velox::RowVector> const&, std::optional<facebook::velox::SelectivityVector> const&, std::shared_ptr<facebook::velox::BaseVector>&&, bool, facebook::velox::fuzzer::InputRowMetadata const&)
@ 00000000013dd3a5 facebook::velox::fuzzer::ExpressionFuzzerVerifier::go()
@PHILO-HE Would you open an issue so we can take a further look when enabling the velox_fuzzer_enable_complex_types option?
  return totalResultBytes;
}

/// Initialize some vectors for inputs. And concatenate consecutive
Use // for a private function, since it is not exposed to users.
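As a rough illustration of this comment convention, the sketch below documents a public entry point with `///` and a private helper with `//`. The `Joiner` class and its `join` method are hypothetical and not taken from the PR; only the name `totalResultBytes` appears in the hunk above, and what it computes in the actual patch is not shown there, so the size calculation here is an assumption.

```cpp
#include <cstddef>
#include <string>
#include <vector>

/// Hypothetical helper class used only to illustrate the comment style.
class Joiner {
 public:
  /// Returns the inputs joined by the separator. Documented with '///'
  /// because callers are expected to read this description.
  static std::string join(
      const std::string& separator,
      const std::vector<std::string>& inputs) {
    std::string result;
    // Reserve the final size once so appending does not reallocate.
    result.reserve(totalResultBytes(separator, inputs));
    for (std::size_t i = 0; i < inputs.size(); ++i) {
      if (i > 0) {
        result += separator;
      }
      result += inputs[i];
    }
    return result;
  }

 private:
  // Computes the byte size of the joined result. Plain '//' is used here
  // because the helper is private and not part of the documented interface.
  static std::size_t totalResultBytes(
      const std::string& separator,
      const std::vector<std::string>& inputs) {
    std::size_t total = 0;
    for (const auto& input : inputs) {
      total += input.size();
    }
    if (!inputs.empty()) {
      total += separator.size() * (inputs.size() - 1);
    }
    return total;
  }
};
```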
@@ -253,6 +253,17 @@ static const std::unordered_map<
    /// them to fuzzer instead of hard-coding signatures here.
    getSignaturesForCast(),
    },
    {
    "concat_ws",
nit: maybe include a note indicating that this signature is only for Spark SQL. We will separate the Presto and Spark signatures in #9949. Thanks.
@rui-mo, just updated. Thanks!
@@ -44,6 +44,9 @@ void registerSpecialFormGeneralFunctions(const std::string& prefix) {
      "cast", std::make_unique<SparkCastCallToSpecialForm>());
  registerFunctionCallToSpecialForm(
      "try_cast", std::make_unique<SparkTryCastCallToSpecialForm>());
  registerFunctionCallToSpecialForm(
      ConcatWsCallToSpecialForm::kConcatWs,
I suppose concat_ws should be in RegisterString.cpp.
@Yohahaha, it seems all special form functions should be registered in RegisterSpecialForm.cpp?
concat_ws is a string function in Spark, https://github.com/apache/spark/blob/13315eeec07e2aebcc05dfee762bbd060ae192ec/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala#L77, and you documented it in string.rst too.
@Yohahaha, just updated this pr according to your suggestion. Thanks!
@xiaoxmeng, could you merge this pr if you have no comment?
Friendly ping @xiaoxmeng. Could you merge this pr? The CI failure is not related.
@bikramSingh91 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@bikramSingh91, thanks for importing the pr! I note internal test is red. If it's not related to this pr, could you merge it?
Friendly ping @bikramSingh91, could you merge this pr if no other change is required?
Hi @kgpai, can you merge this pr if you have no comment?
@PHILO-HE Following up with @bikramSingh91
@bikramSingh91 merged this pull request in bddddf8.
Summary:
Add the concat_ws Spark function, which returns the concatenation of its inputs, separated by a separator (the first argument). It accepts a variable number of VARCHAR or ARRAY<VARCHAR> arguments, and these two types can be mixed in a single call.

This function is similar to [ConcatFunction](https://github.com/facebookincubator/velox/blob/main/velox/functions/prestosql/StringFunctions.cpp#L140), except that `concat_ws` requires a separator and supports ARRAY<VARCHAR> arguments and mixed argument types.

This PR is based on facebookincubator#6292 (author: unigof). It adds a few bug fixes and improvements, and makes some changes to align with Spark.

Doc [link](https://docs.databricks.com/en/sql/language-manual/functions/concat_ws.html).

Pull Request resolved: facebookincubator#8854
Reviewed By: kgpai
Differential Revision: D66898251
Pulled By: bikramSingh91
fbshipit-source-id: 1fcd193a245bea4062c4e20d1e1db9ad6cc3290b
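The linked Databricks page also describes how concat_ws treats NULLs: a NULL separator yields NULL, and NULL inputs are ignored. The following minimal sketch models that documented behavior for string inputs only. `concatWsNullable` is a hypothetical name, `std::optional` stands in for SQL NULL, and whether the Velox implementation matches this in every edge case is not shown in this thread.

```cpp
#include <optional>
#include <string>
#include <vector>

// Models the documented Spark semantics with std::optional as SQL NULL:
// a NULL separator gives NULL, NULL inputs are skipped, and no non-NULL
// inputs gives an empty string.
std::optional<std::string> concatWsNullable(
    const std::optional<std::string>& separator,
    const std::vector<std::optional<std::string>>& inputs) {
  if (!separator.has_value()) {
    return std::nullopt;
  }
  std::string result;
  bool first = true;
  for (const auto& input : inputs) {
    if (!input.has_value()) {
      continue; // NULL inputs do not contribute a value or a separator.
    }
    if (!first) {
      result += *separator;
    }
    result += *input;
    first = false;
  }
  return result;
}
```

Under this model, joining `'a'`, NULL, and `'b'` with `','` gives `a,b` rather than NULL, which is one way concat_ws differs from a plain concatenation that propagates NULLs.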